Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Addition of a Template objects and refactor of matching engines #1527

Closed
wants to merge 30 commits into from

Conversation

yger
Copy link
Collaborator

@yger yger commented Apr 12, 2023

Here are some modifications for the Wobble algorithm, that I'll try to proofread intensively (and maybe steal ideas for the omp :-)). However, when running the algorithm out-of-the-box with default params, on a toy benchmark (32 channels), I do not get results as good as the ones you showed last time (see results below). Is it normal? Should I tweak params? I'll try with Neuropixel-like recordings. For now, all the engine are accepting as input a waveform_extractor. This is convenient, because we could also use the intrinsic sparsity there to factorize this among engines.
I guess we should also have more benchmarks on times, and more unit tests as mentioned in #1514

image

@alejoe91
Copy link
Member

That looks ok for me! I think that the main reason why it was not implemented in the first place was to run this implementation against the original ones, which uses template arrays.

Maybe we can make both options available? Either pass a WaveformExtractor, or templates, nbefore, nafter

@pauladkisson
Copy link
Collaborator

The reason why I used templates as an input instead of the waveform extractor is that the wobble algorithm can support multiple templates per unit, which would be hard (impossible?) to define in a waveform extractor depending on how you want the templates to look for each unit. I'm not totally sure what the use case is to have multiple templates for each unit, but @cwindolf should be able to explain.

I agree that all the template matching algorithms should have the same mandatory inputs (obviously not optional parameters), but I think the more fundamental input is the templates themselves rather than the waveform extractor. In general, one might want to perform template matching with templates from another recording, with some alterations, smoothing etc. So, I think we should define template matching algorithms as starting with the templates as a given rather than extracting them in some way from the waveform extractor.

@cwindolf
Copy link
Collaborator

Yes, there are multiple reasons for not using a waveform extractor

  • Multiple templates per unit (will explain this in detail soon)
  • Templates can be computed using fancy methods other than the default. Taking templates as an input instead of WE decouples template computation and deconvolution, which is important for us
  • What if we want to extract waveforms in a new way which is drift-aware? Current waveform tooling doesn't support this, so again for us it is helpful to decouple

@yger
Copy link
Collaborator Author

yger commented Apr 12, 2023

Ok, as said I can be convinced. The safe option would be to add two options. Either a waveform extractor (and with sparsity methods that could be standardized among engines), or templates with nbefore/nafter. I'll work on that

@yger
Copy link
Collaborator Author

yger commented Apr 13, 2023

Actually, here is what I'll do. For all methods, we will always provide a waveform_extractor (such that nbefore and nafter can be inferred), and a 'sparsity' field, such that if the waveform_extractor is not sparse, it will be sparsified via the core methods of spikeinterface and the params in this 'sparsity' field (this will allow finer comparisons between engines, because currently, each of them has its own ways of sparsifying).

Meanwhile, methods will also have extra fields called templates and sparsity_mask. If these fields are provided, then the templates will be taken from here (bypassing the waveform_extractor), and same for the sparsity_mask. Will it be fine for everybody?

@alejoe91
Copy link
Member

@yger @pauladkisson I think that there is an even better long-term solution, which is to allow to instantiate a WaveformExtractor.from_templates(templates, unit_ids, sorting=None, recording=None). Note that we already have a "waveformless" option for the WaveformExtractor, so this should be a possible extension. Of course it will require a bit more work, so I'm ok with Pierre's proposal for the time being

@yger
Copy link
Collaborator Author

yger commented Apr 13, 2023

Yes, the best would be to have a from_templates() method and thus a single waveform_extractror. With the extra features that such an object should be able to handle several templates per units. But ok for the green light, I'll start with my solution for now

@h-mayorquin
Copy link
Collaborator

@yger

This is convenient, because we could also use the intrinsic sparsity there to factorize this among engines (template matching engines?)

What do you mean by "factorize" above?

My two cents:

I am concerned about coupling.

Don't we have sparsity objects as stand alone to accomplish this?

It seems like an overkill, plus an unnecessary demand for users, to pass an instance of waveform extractor if all we need is:

  • A sparsity object which is a stand alone.
  • Templates
  • Their temporal information (e.g. n_before, n_after)

An intermediate approach would be to have a template class that has the temporal information (which is an attbute of the templates) and possibilty the sparsity as attributes. The WaveformExtractor then can be composed with this TemplateClass to achieve code deduplication:

How would that look?

For people that don't wnat to deal with WaveformExtractor

templates = TemplateClass(templates: array, temporal_information)  (you can wrap this in a function if you don't like class instantiation)
spikes = template_matching_instance(templates=templates) 

And for people that want to use WaveformExtractor:

templates = waveform_extractor_instance.get_templates()
spikes = template_matching_instance(templates=templates) 

This covers the two uses cases, reduces duplication and avoids excessive coupling between this module and the multi-responsability-harder-to-understand WaveformExtractor object. I think the latter point is imporant both for users and developers.

In brief, the current proposal increases even more the responsabilities of WaveformExtractor -reducing cohesion- and creates more coupling among our code. Plus, it forces our users to deal with that object when they don't strictly need it. I think we can do better.

Am I missing something?

@yger
Copy link
Collaborator Author

yger commented Apr 13, 2023

No i guess you are right, I just wanted to make use of existing objects, and because the waveform_extractor has a lot of functions/attributes used by the engines, I thought about reusing it. But we'll see. Now the code is working here with the behavior explained before: if templates/sparsity_mask are manually provided, then fine, they are used by default. If a waveform_extractor is passed, otherwise, templates are obtained from it, and eventually sparsified with the same method for all sorters (that could be specified). This is important, because currently all engines are dealing with their own way to sparsify, such that we can not really compare among themselves
I'll be gone tomorrow for holidays, back on the 24th. We could have a call then to discuss the details, no hurry.

@alejoe91 alejoe91 added the sortingcomponents Related to sortingcomponents module label Apr 18, 2023
@yger
Copy link
Collaborator Author

yger commented Apr 25, 2023

Back in the game! Actually, we can discuss that during next call. The thing is that for some methods, you might want to have a waveform_extractor as an input (as Sam told me). Because sometimes, having individual waveforms to estimate the noise around the centroids might be needed by matching methods. So I don't know. To me, the best solution should then be a methods that will accept a Template object (like the one you mentioned, with templates, (sparsity mask?) and temporal information n_before/n_after), or a WaveformExtractor object that can be sparse or not

@pauladkisson
Copy link
Collaborator

The thing is that for some methods, you might want to have a waveform_extractor as an input (as Sam told me). Because sometimes, having individual waveforms to estimate the noise around the centroids might be needed by matching methods.

I can't find anywhere in tridesclous's matching that uses that information (maybe I'm missing it, or maybe Sam is talking about some other method?). In either case, it should be relatively straightforward to compute that sort of thing outside of the matching method, and then pass it in as a method-specific parameter. That would also add more flexibility to accommodate template matching procedures that use templates unassociated with specific spike trains.

@yger
Copy link
Collaborator Author

yger commented Apr 25, 2023

This is used in Circus (non omp). In fact, for every centroid, we use several snippets of waveforms to estimate the noise and optimize the range of the allowed amplitudes. But since omp is much better, not sure I want to maintain that, just making the point.
As said, I think the best would be to allow every method to work with either a Templates object (with temporal attributes and that can easily be obtained from a WaveformExtractor). Or maybe directly with the waveform_extractor (that can be more generic, also have precomputed noise levels, ...). Let's discuss that later. For now tests are failing for unknown and unclear reasons

@pauladkisson
Copy link
Collaborator

This is used in Circus (non omp).

Thanks. I see that circus uses the waveform snippets in the _optimize_amplitudes() function. It looks like it returns some kind of optimized amplitudes for each unit.

I still think this function could be pretty easily extracted out of the matching and pass the 'optimal amplitudes' into circus's matching alg.

Maybe we could have another file in the matching submodule (or its own submodule?) called extract_templates or curate_templates (or something) to explicitly encode each sorter's preferred way to extract/curate templates from a waveform extractor. Something to think about.

In any case, I look forward to discussing in more detail on Thursday.

@yger yger changed the title Modifications for the Wobble engine Addition of a Template objects and refactor of matching engines Apr 27, 2023
@yger
Copy link
Collaborator Author

yger commented May 24, 2023

@alejoe91 @samuelgarcia I've tried to add a TemplatesDictionary object, feel free to let me know what do you think about it. All the matching methods are now working with it, and the best maybe would be to add to the waveform_extractor object an option to directly export such an object. @pauladkisson the sparsification performed internally by wobble could/should be removed then, for now I've left it there, but the plan would be to perform sparsification in a common framework, before sending templates to the engines

@yger yger marked this pull request as ready for review May 24, 2023 14:12
from spikeinterface.core.job_tools import ChunkRecordingExecutor, fix_job_kwargs
from spikeinterface.core import get_chunk_with_margin


def find_spikes_from_templates(recording, method="naive", method_kwargs={}, extra_outputs=False, **job_kwargs):
class TemplatesDictionary(object):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class should live in core, see #1982

@alejoe91
Copy link
Member

@yger let's do this step by step and close this now.

First the Templates class (#1982) then the rest!

@alejoe91 alejoe91 closed this Sep 13, 2023
@yger yger deleted the wobble_patch branch September 13, 2023 09:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sortingcomponents Related to sortingcomponents module
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants